1. INTRODUCTION

As an essential tool for statistical process control (SPC), the control chart plays a vital role in the
quality control of various production processes. The principle of control chart pattern recognition is to identify
whether the process is operating under standard conditions and is stable with respect to the quality
characteristic, or whether it has deteriorated from a stable to an unstable state. Many research efforts involve
control chart patterns (CCPs) to improve production quality [1]. CCPs combined with artificial neural networks
(ANNs) can be used to monitor quality in many sectors, such as health, the food industry, and the financial
market [2]. Many unstable patterns exist, but most previous studies focused on five common abnormal types in
addition to the normal pattern: cycle, increase trend, decrease trend, upward shift, and downward shift [3]-[8],
among others.

The input data representation depends on the dimensionality of the data, whether it is raw data or
feature-based. The feature-based representation gives better accuracy than raw data and improves performance
by removing redundant and irrelevant features [9], [10]. Many features have been extracted from raw data in
previous studies, such as wavelet features [11], shape features [7], and statistical features [5]. Some researchers
used mixed (statistical and shape) features to improve the performance [12] and reported good accuracy
when using mixed features [13], [14].

Many feature selection algorithms have been applied to obtain a good data representation and remove
redundant features, such as the genetic algorithm (GA) [13], principal component analysis (PCA) [14], and kernel
entropy component analysis (KECA) as a feature reduction algorithm [15]. Feature selection plays a critical
role in monitoring the control charts and pattern recognition, driven by decreasing input dimensionality in 
target problems and growing interest in advanced. 

Relief-based algorithms (RAs) are a distinctive family of filter-style feature selection algorithms that have
gained popularity by finding an appropriate balance between these goals while adapting flexibly to diverse data
characteristics. The feature selection process reduces the dimensionality of data sets in classification and
regression tasks. It eliminates redundant and irrelevant features that have little influence on the model's
objective. This approach can lower computing cost and training time, reduce overfitting and complexity, and
achieve acceptable accuracy with a minimum number of features. The Relief method is a common and
straightforward instance-based learning technique [16], [17]. The core idea behind RAs is to estimate the quality
of features based on how well each feature distinguishes between instances that are near each other. The
correlation coefficient quantifies the degree of the linear relationship between two variables, and its absolute
value is less than or equal to one. The stronger the linear correlation between the two variables, the closer the
absolute value of the correlation coefficient is to 1. As a result, the absolute value of the correlation coefficient
between features can be used to assess the degree of redundancy between features. Taking the absolute value of
the correlation coefficient as the weight of a redundant feature, the minimization principle is used in the
assessment criterion to choose the feature subset carrying the most classification information, which enhances
classification accuracy [18]. The Fisher score is a feature selection method that reduces dimensionality by
selecting the best features. With m selected features, the input data matrix $X \in \mathbb{R}^{d \times n}$
reduces to $Z \in \mathbb{R}^{m \times n}$ [19]. The top m-ranked features are selected by calculating the Fisher
score for each feature. The reduced dataset is the sub-dataset containing the m most significant features of the
original dataset, which consists of d features [20]. The Fisher Z transformation is a formula that transforms
Pearson's correlation coefficient (r) into a value ($z_r$) that can be used to calculate a confidence interval for
Pearson's correlation coefficient. Fisher's Z gives confidence intervals for both r and differences between
correlations. However, it is typically employed to examine the significance of the difference between two
correlation coefficients, $r_1$ and $r_2$, from sample data [21].

Most previous studies relied on a single feature selection algorithm. In this paper, we therefore
propose an improved approach that selects the best feature set extracted from raw data by employing three
feature selection algorithms (Relief, correlation, and Fisher) with an ANN-based classifier to detect a small
shift (less than 1.5 sigma) in an unstable process. The significance of this strategy is that it strengthens the
feature selection decision by relying on three selection algorithms rather than a single one, which reduces the
wrong selection of redundant features. The paper is organized into four sections. Section 2 provides
the methodology, section 3 presents the results and discussion, and section 4 concludes the paper.


2. METHOD 
2.1. Data generation 

In this paper, we focus on the six common patterns examined in previous studies (normal, cycle, increase
trend, decrease trend, upward shift, and downward shift) [5], [11], [22]. As in previous studies, Monte Carlo
simulation was used rather than real data, because collecting real data for all pattern types is not economical.
A total of 6,000 X-bar chart patterns were generated (1,000 patterns for each type) using (1) to (6).


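Normal (NOR): $y_i = \mu + r_i \sigma$ (1)
Cyclic (CYC): $y_i = \mu + r_i \sigma + a \sin(2 \pi i / T)$ (2)
Increase trend (IT): $y_i = \mu + r_i \sigma + g i$ (3)
Decrease trend (DT): $y_i = \mu + r_i \sigma - g i$ (4)
Upward shift (US): $y_i = \mu + r_i \sigma + k s$ (5)
Downward shift (DS): $y_i = \mu + r_i \sigma - k s$ (6)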


The parameters of these equations used in previous studies are given in Table 1. In this study,
the parameters were selected to be consistent with previous research, as shown in Table 2.


An improved features selection approach for control chart patterns recognition (Waseem Alwan) 


ISSN: 2502-4752


Table 1. The parameters and values of the equations utilized in previous literature 
Ref. Normal Stratification Systematic Cyclic Increase Trend Decrease Trend Upward Shift Downward Shift RA 


Pram. Mean Random Systematic Amplitude Gradient Gradient magnitude magnitude (%) 
Std noise departure Period position position 
[9] p=0 1/3(0) a=0.56-2.56 0.0156-0.0256 (-0.025-0.015)o 0.7 5to2.56 -2.56--0.76 95.2 
o= p= 10 
[23] p=80 0.26 <0 0.26 lo<g<30 l5o<a<250 0.056<¢<0.1lo —O0.1lo<g<-0.050 l5o<s<2.50 -256<s<-l.5o 99.6 
= 8<p< 16 15<P<45 15 <P <45 
[24] u=0 lo<g<30 1l5o<a<256 0.056<g<0.16-0.16<g<-0.056 lS5o0<s<256 -256<s<-15 <9 
o= 4<p<8 ll<p<2l ll<p<2l 
[3] p=30 1.50<a<4o [0.1 0, 0.3 o] [-0.3 6, -0.1o] [1.50, 30], 1.50, 30], 99.3 
o=0.05 4sp<8 p= [4-9] p= [4-9] 
[8] p=0 (0.1-0.4) 6 (0.005-2.5)o  0sa<1.8 0.015-0.025 0.015-0.025 s=0.6-2.5 s=0.6-2.5 99.8 
o= p=10 
[25] u=0 a= [1,3] 0 [0.1,0.3] o [-0.3- -0.1] o s= [lo, 30] s= [-30, -lo] 98 
o= p=8 P= 12,13,14 p= 12,13,14 


Table 2. The parameters and values utilized for the six CCPs

Parameter | Definition | Value
μ | Mean | 0
σ | Standard deviation | 1
σ' | Random noise for each abnormal pattern | σ' = (1/3)σ
a | Amplitude | 0.5σ < a < 2.5σ
T | Period of cycle | 8, 10
s | Shift magnitude | normal shift 1.5σ < s < 2.8σ; small shift s < 1.5σ
k | Shift position | position = (5, 15, 20); k = 1 if i ≥ position, else k = 0
g | Gradient for a trend pattern | 0.015σ < g < 0.025σ
r_i | Random value of a standard normal variate at the ith time point | -3 < r < +3
y_i | Time series value at the ith time point | i = 1-30

Standardized: N(0,1)
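
As an illustration only, equations (1) to (6) can be sketched in Python with the standard library. The function name `generate_pattern`, its default argument values, and the rejection step that keeps each random value inside (-3, +3) are assumptions made for this sketch (the study itself used MATLAB), and a single noise level sigma is used for every pattern rather than the separate σ' of Table 2.

```python
import math
import random

def generate_pattern(kind, n=30, mu=0.0, sigma=1.0, a=1.5, T=8,
                     g=0.02, s=2.0, position=15, rng=None):
    """Return one synthetic pattern of length n following equations (1) to (6).

    kind is one of "NOR", "CYC", "IT", "DT", "US", "DS". The defaults are
    illustrative values inside the ranges of Table 2 (with sigma = 1).
    """
    rng = rng or random.Random()
    series = []
    for i in range(1, n + 1):
        # r_i: standard normal variate, redrawn until it lies in (-3, +3).
        r = rng.gauss(0.0, 1.0)
        while abs(r) >= 3.0:
            r = rng.gauss(0.0, 1.0)
        y = mu + r * sigma                  # common part of all six equations
        k = 1 if i >= position else 0       # shift indicator
        if kind == "CYC":
            y += a * math.sin(2 * math.pi * i / T)
        elif kind == "IT":
            y += g * i
        elif kind == "DT":
            y -= g * i
        elif kind == "US":
            y += k * s
        elif kind == "DS":
            y -= k * s
        elif kind != "NOR":
            raise ValueError("unknown pattern type: " + kind)
        series.append(y)
    return series
```

Calling `generate_pattern("US", rng=random.Random(0))` returns a list of 30 values; repeating the call 1,000 times per pattern type would give a dataset of the size described above.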


2.2. Features extraction 

Feature extraction from raw data decreases the input dimension for machine learning and
increases recognition efficiency by keeping the network size small [15]. Two types of features are commonly
used with CCPs: statistical features and shape features. We focus on the 13 common mixed
(statistical and shape) features used with CCPs [12].


2.2.1. Statistical features 

Ten candidate statistical features were examined in this research (mean, standard deviation, maximum
value, minimum value, skewness, kurtosis, mean square error, slope, mean square value, and cumulative
summation), as shown in Figure 1. They were chosen based on previous studies [8], [10], [26]. The mean is
approximately zero for the normal and cycle patterns, greater than zero for the increase trend and upward shift,
and less than zero for the decrease trend and downward shift. This feature can therefore differentiate the
normal and cycle patterns from the other patterns, as shown in Figure 1(a). The standard deviation can
differentiate between the normal pattern and the other patterns, as shown in Figure 1(b). The maximum value is
highest for the increase trend and upward shift, lower for normal and cycle, and lowest for the decrease trend
and downward shift, as shown in Figure 1(c). The minimum value is highest for the decrease trend and
downward shift, lower for normal and cycle, and lowest for the increase trend and upward shift, as shown in
Figure 1(d). Skewness is higher for trend and normal and lowest for cycle and shift, as shown in Figure 1(e).
Kurtosis is higher for the increase trend and upward shift than for the other patterns, as shown in Figure 1(f).
The mean square value is lowest for normal and higher for the other patterns, as shown in Figure 1(g). The
slope has a large positive value for the increase trend and upward shift, is approximately zero for normal and
cycle, and has a large negative value for the decrease trend and downward shift, as shown in Figure 1(h). The
mean square error is higher for shift and cycle than for the other patterns, as shown in Figure 1(i). Cumulative
summation is highest for the increase trend and upward shift, lower for cycle, and lowest for normal, decrease
trend, and downward shift, as shown in Figure 1(j).
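
A minimal sketch of how several of these statistical features might be computed for one pattern is given here. The paper does not state the formulas, so the definitions used (population moments, and a least-squares slope against the time index 1..n) are assumptions; mean square error and cumulative summation are left out because their exact definitions are not given. The function name `statistical_features` is hypothetical.

```python
import math

def statistical_features(y):
    """Return a dict of simple statistical features for one pattern y (len >= 2)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n      # population variance (assumed)
    std = math.sqrt(var)
    if std > 0:
        # Standardized third and fourth central moments.
        skewness = sum((v - mean) ** 3 for v in y) / (n * std ** 3)
        kurtosis = sum((v - mean) ** 4 for v in y) / (n * std ** 4)
    else:
        skewness = kurtosis = 0.0
    # Least-squares slope of y against the time index t = 1..n.
    t_mean = (n + 1) / 2
    sxy = sum((t - t_mean) * (v - mean) for t, v in enumerate(y, start=1))
    sxx = sum((t - t_mean) ** 2 for t in range(1, n + 1))
    return {
        "mean": mean,
        "std": std,
        "max": max(y),
        "min": min(y),
        "skewness": skewness,
        "kurtosis": kurtosis,
        "mean_square_value": sum(v * v for v in y) / n,
        "slope": sxy / sxx,
    }
```

For a pure ramp such as `[1, 2, 3, 4, 5]` the slope is 1 and the skewness is 0, which matches the qualitative behaviour described for trend patterns.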


2.2.2. Shape features 


This study selected three typical shape features [27]: the least-square slope, the area between the pattern
and its mean line (APML), and the area between the pattern and its least-square line (APLS), as shown in
Figure 2. The least-square slope has a higher value for trend and shift and a minimum value for normal and
cycle, as shown in Figure 2(a). APML has a higher value for trend and shift and a minimum value for normal and
cycle, as shown in Figure 2(b). APLS has a higher positive value for the increase trend and upward shift, a
minimum value for normal and cycle, and a higher negative value for the decrease trend and downward shift, as
shown in Figure 2(c).


Figure 1. Statistical features for each pattern (a) mean, (b) standard deviation, (c) maximum, (d) minimum,
(e) skewness, (f) kurtosis, (g) mean square value, (h) slope, (i) mean square error, and
(j) cumulative summation


Figure 2. Shape features for each pattern (a) least-square slope, (b) APML, and (c) APLS


2.3. Features selection 

Machine learning approaches are used to build control chart pattern recognition (CCPR) models
that differentiate and identify the different pattern classes. The accuracy of these models is determined by the
features used and the quality of the information collected. In general, not all of the features used in pattern
recognition are efficient at distinguishing the patterns. Some features may be redundant or irrelevant and may
lower the accuracy of the classifier. As a result, feature selection is critical when implementing machine
learning techniques [12]. The features therefore need to be filtered so that as few features as possible still
represent the data. For this purpose, we employ three feature selection algorithms to select the vital features
that can represent the pattern type with a minimum number of features.

These three algorithms (the Relief algorithm, the correlation coefficient, and the Fisher coefficient) were
used in previous studies in many fields [28], [29]. However, this is the first time they are used together with
CCPs as a new approach to feature selection. The best six features across all three algorithms are then selected
and used as the input to the network.


2.4. Pattern recognizer design 

Many classification approaches are available in the literature, such as ANN, support vector machine
(SVM), and decision tree (DT) [30]. The multilayer perceptron (MLP) architecture was employed as the
recognizer because it has been applied to solve more complex problems, such as prediction and modelling in
CCPs [7], [31]-[33]. It has three layers. The first is the input layer, whose size corresponds to the number of
features (6). The second is a single hidden layer, whose number of nodes was set empirically to 12. The third
is the output layer, whose size corresponds to the number of patterns studied (6). The network structure is
therefore (6x12x6). The Levenberg-Marquardt algorithm (trainlm) was employed as the learning algorithm after
testing other learning algorithms, namely Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton (trainbfg)
and gradient descent with momentum and adaptive learning rate (traingdx).

The first step is the generation of 6,000 patterns, 1,000 for each pattern type, according to (1) to
(6). The 13 mixed features are then extracted from the raw data, and the best 6 features are selected according
to the results of the three algorithms (Relief, correlation, and Fisher). After the features are selected, they must
be normalized to the range (-1, +1) using (7). This step helps the network train well and generalize. During the
training phase, the data must be labeled for each pattern class: the target value of the recognizer's output node
for the correct class is set to 0.9, and the target values for the other classes are set to 0.1, as shown in Table 3.
The dataset was divided into training (70%), validation (10%), and preliminary testing (20%) sets before the
sample data were presented to the ANN for learning. Thus 4,200 patterns were used for training, that is, for
updating the network weights and biases, 600 patterns were used for validation, and 1,200 patterns, which
were hidden during the training phase, were used for testing. The training parameters and specifications of the
network are given after Table 3.

$P_n = \frac{2 (P - P_{min})}{P_{max} - P_{min}} - 1$ (7)

Where: P = observed feature value, $P_n$ = normalized observed feature value, $P_{min}$ = minimum value of
the feature, and $P_{max}$ = maximum value of the feature.
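
The scaling of each feature to (-1, +1) referred to as (7) can be sketched as follows, assuming the standard min-max form built from the observed, minimum, and maximum feature values. The function name `normalize_feature` and the handling of a constant feature are choices made for this sketch only.

```python
def normalize_feature(values):
    """Scale one feature column linearly so its minimum maps to -1 and its maximum to +1."""
    p_min, p_max = min(values), max(values)
    if p_max == p_min:
        # A constant feature carries no information; map it to 0 (assumption).
        return [0.0 for _ in values]
    return [2.0 * (p - p_min) / (p_max - p_min) - 1.0 for p in values]
```

Each of the six selected features would be normalized separately, column by column, before being presented to the network.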


Table 3. Targeted recognizer outputs [9]

Pattern class   Definition   1     2     3     4     5     6
1               NOR          0.9   0.1   0.1   0.1   0.1   0.1
2               CYC          0.1   0.9   0.1   0.1   0.1   0.1
3               IT           0.1   0.1   0.9   0.1   0.1   0.1
4               DT           0.1   0.1   0.1   0.9   0.1   0.1
5               US           0.1   0.1   0.1   0.1   0.9   0.1
6               DS           0.1   0.1   0.1   0.1   0.1   0.9
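
The target coding of Table 3 and the 70/10/20 split described above can be illustrated with a short sketch. The helper names `target_vector` and `split_indices`, the random shuffling, and the seed are assumptions; the paper does not say how the patterns were assigned to the three subsets.

```python
import random

CLASSES = ["NOR", "CYC", "IT", "DT", "US", "DS"]

def target_vector(label, hit=0.9, miss=0.1):
    """Return the six target outputs: 0.9 for the correct class and 0.1 elsewhere."""
    return [hit if c == label else miss for c in CLASSES]

def split_indices(n, train=0.7, val=0.1, seed=0):
    """Shuffle the indices 0..n-1 and split them into training, validation and test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)
    n_val = round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

With `n = 6000` this gives 4,200 training, 600 validation, and 1,200 test indices, the subset sizes reported in the text.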


The flow chart of the training and testing phases is shown in Figure 3: Figure 3(a) shows the training
phase and Figure 3(b) shows the testing phase. The training settings were as follows: maximum number of
epochs, 500; learning rate, 0.5; momentum constant, 0.5; performance measure, mean square error (MSE);
performance goal, $10^{-3}$. All the procedures were coded in MATLAB R2017a using its ANN toolbox.

To evaluate the efficiency of the RCF-ANN approach, we test the network performance with the 13
common features extracted from raw data as input and with the six features retained after applying RCF
(Relief, correlation, and Fisher), and compare the results. The confusion matrix is used to evaluate the correct
recognition.


Figure 3. The study flow chart for (a) training and (b) testing


3. RESULTS AND DISCUSSION
3.1. Results of relief algorithm 

Kira and Rendell developed the Relief algorithm, which is based on instance-based learning, and it was
later improved by Kononenko [34]. The Relief algorithm is quite effective at estimating feature quality and
chooses the features that are relevant to the target. It randomly selects an instance from the data and finds its
nearest neighbors. The feature values of the nearest neighbors are compared with those of the sampled instance
to update the relevance score of each feature. For a given instance, Relief looks for two nearest neighbors: one
from the same class (the nearest hit) and one from a different class (the nearest miss) [34]. Relief's estimate
W[A] of feature A is an approximation of the following difference of probabilities: W[A] = P(different value of
A | nearest instance from a different class) - P(different value of A | nearest instance from the same class). The
reasoning is that a good feature should distinguish between instances of different classes while having the same
value for instances of the same class [28]. This feature selection algorithm yields two outputs, the rank of the
features and their weights, as shown in Figure 4. Most previous researchers rely on the weight because it best
reflects how close each feature is to the target.
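
The nearest-hit and nearest-miss update described above can be sketched in a few lines. This is the basic Relief scheme for numeric features, not the authors' implementation: the function name `relief_weights`, the range-scaled Manhattan distance, the zero initial weights, and the use of a single nearest miss drawn from any other class are all assumptions of the sketch.

```python
import random

def relief_weights(X, y, n_iter=None, seed=0):
    """Estimate one Relief weight per feature.

    X is a list of numeric feature rows and y the matching class labels.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Feature ranges, used to scale every difference into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    m = n_iter or n
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)                      # randomly sampled instance
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))     # nearest hit
        miss = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest miss
        for j in range(d):
            # Reward features that differ on the miss, penalize those that differ on the hit.
            w[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / m
    return w
```

A feature that separates the classes receives a positive weight, while a feature that varies mostly within a class drifts towards zero or below.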

In this paper, we rely on the weight to select the best features. Two parameters of this algorithm must be
set. In this study, the initial weight for all features is 0.2 [28], and the number of valid features is six.
Figure 4(a) shows the features ranked by the Relief algorithm, which orders all the features by rank. In this
experimental study, the slope feature ranks highest, with a weight of 0.2961. The weight value of 0.2069 is the
median of the Relief results, as shown in Figure 4(b). Features with a weight above 0.2069 are therefore
considered selected. Hence the features slope, maximum (Max), mean, least-square slope (LS.S), cumulative
summation (Cum.Sum), minimum (Min), standard deviation (St.D), mean square error (MSE), skewness, and
area between the pattern and its least-square line (APLS) are selected as independent features in the output of
the Relief algorithm. To confirm these features, the next step is to apply the correlation algorithm.


Figure 4. Relief features results depend on (a) rank and (b) weight


3.2. Results of correlation coefficient 

The correlation coefficient measures the statistical strength of the association between features.
Its values range from -1.0 to 1.0. A computed value greater than 1.0 or less than -1.0 indicates an error in the
correlation measurement. A correlation of -1.0 indicates a perfect negative correlation, whereas a correlation
of 1.0 indicates a perfect positive correlation. A correlation of 0.0 indicates no linear relationship between the
two variables. Table 4 presents the correlation coefficient results.


Table 4. Correlation coefficient results

Feature  Slope  Skw.   Max    Mean   LS.S   Kurt   Cum    Min    MSV    Std    MSE    APML   APLS
Slope    1.0    0.0    0.9    0.9    0.5    -0.9   -0.5   1.0    0.0    0.9    1.0    -0.6   1.0
Skw.     0.0    1.0    0.4    -0.4   0.1    0.3    0.7    -0.1   1.0    0.2    -0.1   0.3    0.0
Max      0.9    0.4    1.0    0.6    0.6    -0.6   0.0    0.9    -0.4   0.9    0.8    -0.3   0.9
Mean     0.9    -0.4   0.6    1.0    0.5    -0.9   -0.7   0.9    -0.4   0.7    0.9    -0.7   0.9
LS.S     0.5    0.1    0.6    0.5    1.0    -0.4   0.1    0.6    0.0    0.7    0.7    0.2    0.5
Kurt     -0.9   0.3    -0.6   -0.9   -0.4   1.0    0.8    -0.9   0.2    -0.7   -0.9   0.8    -0.9
Cum      -0.5   0.7    0.0    -0.7   0.1    0.8    1.0    -0.5   0.6    -0.1   -0.5   0.9    -0.5
Min      1.0    -0.1   0.9    0.9    0.6    -0.9   -0.5   1.0    -0.1   0.9    1.0    -0.5   1.0
MSV      0.0    1.0    0.4    -0.4   0.0    0.2    0.6    -0.1   1.0    0.1    -0.1   0.2    0.0
Std      0.9    0.2    0.9    0.7    0.7    -0.7   -0.1   0.9    0.1    1.0    0.9    -0.2   0.9
MSE      1.0    -0.1   0.8    0.9    0.7    -0.9   -0.5   1.0    -0.1   0.9    1.0    -0.5   1.0
APML     -0.6   0.3    -0.3   -0.7   0.2    0.8    0.9    -0.5   0.2    0.2    -0.5   1.0    -0.6
APLS     1.0    0.0    0.9    0.9    0.5    -0.9   -0.5   1.0    0.0    0.9    1.0    -0.6   1.0


Table 4 shows the correlation among all the features. The first row, for example, shows a high
correlation between the slope and Max, Mean, Min, Std, MSE, and APLS. We retained the higher correlations
between the features and neglected the lower ones, with the threshold set at 0.6, as shown in Table 5.


Table 5. Best correlation between the features

Feature  Slope  Max    Mean   LS.S   Min    Std    MSE    APLS
Slope    1.0    0.9    0.9    -      1.0    0.9    1.0    1.0
Max      -      1.0    0.6    0.6    0.9    0.9    0.8    0.9
Mean     -      -      1.0    -      0.9    0.7    0.9    0.9
LS.S     -      -      -      1.0    0.6    0.7    0.7    -
Min      -      -      -      -      1.0    0.9    1.0    1.0
Std      -      -      -      -      -      1.0    0.9    -
MSE      -      -      -      -      -      -      1.0    1.0
APLS     -      -      -      -      -      -      -      1.0


A correlation coefficient of more than 0.6 indicates a strong positive correlation between the
features. There might be a causal relationship between two correlated variables, although such a link may be
indirect. A portion of the variation in one variable, as measured by its variance, may be attributed to its link
with the causes of the other variable.
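
The thresholding step can be sketched as follows, assuming that Pearson's coefficient is computed between every pair of feature columns and that a pair is retained when its coefficient reaches the 0.6 threshold. Negative correlations are not retained here, which follows the positive entries kept in Table 5. The function names and the dictionary layout are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strongly_correlated_pairs(features, threshold=0.6):
    """Return {(name_a, name_b): r} for every feature pair with r >= threshold.

    features maps a feature name to its list of values over all patterns.
    """
    names = list(features)
    pairs = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if r >= threshold:
                pairs[(a, b)] = r
    return pairs
```

Applied to the 13 feature columns, the retained pairs would form the upper triangle of a table laid out like Table 5.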


3.3. Results of fisher coefficient 

The Fisher Z transformation formula was applied in this paper to obtain the Fisher coefficient between
each pair of features. The Fisher transformation is necessary to enable a comparison of the means. The Fisher
results are shown in Table 6.


Table 6. Fisher coefficient results

Feature  Slope   Skew    Max     Mean    LS.S    Kurt    Cum     Min     MSV     Std     MSE     APML    APLS
Slope    Inf     -0.335  0.819   1.6     -0.99   -0.41   0.301   2.901   -0.39   1.401   1.985   0.412   5.84
Skew     -0.34   Inf     0.473   -0.86   -0.31   -0.79   0.742   -0.23   2.911   -0.1    -0.456  0.304   -0.33
Max      0.819   0.473   Inf     0.313   -1.24   -1.01   0.92    1.41    0.404   1.751   1.221   0.605   1.34
Mean     1.379   -0.86   0.313   Inf     -0.39   -0      -0.18   1.643   -0.91   0.788   1.615   0       1.598
LS.S     -0.99   -0.305  -1.24   -0.389  Inf     1.138   -1.02   -1.26   -0.24   -1.42   -0.881  -0.93   -0.99
Kurt     -0.41   -0.793  -1.01   -0.004  1.137   Inf     -0.76   -0.49   -0.77   -0.52   -0.28   1.1     -0.41
Cum      0.301   0.742   0.92    -0.181  -1.02   -0.76   Inf     0.495   0.625   0.717   0.242   1.341   0.303
Min      2.147   -0.229  0.858   1.043   -1.26   -0.49   0.495   Inf     -0.3    2.195   2.401   0.652   2.890
MSV      -0.39   2.911   0.404   -0.905  -0.24   -0.77   0.625   -0.3    Inf     -0.18   -0.531  0.212   -0.39
Std      1.582   -0.104  0.966   0.788   -1.42   -0.53   0.717   2.195   -0.18   Inf     1.34    0.909   1.391
MSE      1.985   -0.456  0.596   1.375   -0.88   -0.29   0.242   1.981   -0.53   1.413   Inf     0.42    1.997
APML     0.413   0.304   0.605   2E-04   -0.94   -0.43   1.541   0.652   0.212   0.909   0.424   Inf     0.416
APLS     5.843   -0.337  0.815   1.378   -0.99   -0.41   0.303   2.165   -0.4    1.591   1.997   0.416   Inf


From Table 6, the Fisher coefficient of each feature with itself, such as slope with slope, is Inf, while the
other entries take negative and positive values. The threshold was applied by selecting the positive values and
neglecting the negative values. The best Fisher coefficients among the features are shown in Table 7.
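
For reference, the textbook Fisher Z transformation of a correlation coefficient r is z = arctanh(r), which is unbounded as r approaches 1; this is consistent with the Inf entries on the diagonal of Table 6. The sketch below shows that transformation and the usual z statistic for the difference between two independent correlations. It is a generic illustration under those textbook definitions and is not claimed to reproduce the remaining values of Table 6, whose exact computation is not described. The function names are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher Z transformation of a correlation coefficient r in [-1, 1]."""
    if abs(r) >= 1.0:
        # arctanh diverges at +/-1, so return a signed infinity.
        return math.copysign(math.inf, r)
    return math.atanh(r)

def compare_correlations(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations.

    n1 and n2 are the sample sizes behind r1 and r2 (each must exceed 3).
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se
```

For example, `fisher_z(0.5)` is about 0.549, and `fisher_z(1.0)` returns infinity.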


Table 7. Higher Fisher coefficient results

Feature  Mean   Min    MSV    Std    MSE    APML   APLS
Slope    1.6    2.9    -      1.4    2      -      5.8
Skew     -      -      2.9    -      -      -      -
Max      -      1.4    -      1.7    1.2    -      1.3
Mean     -      1.6    -      -      1.6    -      1.6
LS.S     -      -      -      -      -      -      -
Kurt     -      -      -      -      -      1.1    -
Cum      -      -      -      -      -      1.3    -
Min      -      -      -      2.1    2.4    -      2.9
Std      -      -      -      -      1.3    -      1.4
MSE      -      -      -      -      -      -      2


From the results of the three feature selection algorithms, we focus on the features with the highest
values in each algorithm and select them. A feature is retained only if it is also selected by the other algorithms,
which confirms the features selected by each method. The feature Max was selected by the Relief and
correlation algorithms but neglected by the Fisher algorithm. The feature Cum was selected only by the Relief
algorithm and neglected by the correlation and Fisher algorithms. A summary of the results of all the feature
selection methods is shown in Table 8.


Table 8. Summary results of the three methods

Feature    Relief     Correlation  Fisher
Slope      Selected   Selected     Selected
Skewness   x          x            x
Max        Selected   Selected     x
Mean       Selected   Selected     Selected
LS slope   Selected   Selected     x
Kurtosis   x          x            x
Cum        Selected   x            x
Min        Selected   Selected     Selected
MSV        x          x            x
Std        Selected   Selected     Selected
MSE        Selected   Selected     Selected
APML       x          x            x
APLS       Selected   Selected     Selected
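
The combination rule behind Table 8 amounts to a set intersection: a feature is kept only if all three algorithms select it. The sketch below transcribes the "Selected" entries of Table 8 into Python sets; the variable names are illustrative.

```python
# "Selected" entries of Table 8, one set per algorithm.
relief = {"Slope", "Max", "Mean", "LS slope", "Cum", "Min", "Std", "MSE", "APLS"}
correlation = {"Slope", "Max", "Mean", "LS slope", "Min", "Std", "MSE", "APLS"}
fisher = {"Slope", "Mean", "Min", "Std", "MSE", "APLS"}

# A feature is retained only when all three algorithms agree.
selected = relief & correlation & fisher
print(sorted(selected))
```

The intersection contains six features (Slope, Mean, Min, Std, MSE, and APLS), the same six used as network input.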


Based on these results, we take the six features selected by all three algorithms (Mean, Std, Slope, Min,
MSE, and APLS) as the input to the network. Before RCF, the results show lower recognition accuracies of
97.33% and 95.87% for the normal shift (1.5σ-2.8σ) and the small shift (less than 1.5σ) datasets, respectively,
as shown in Tables 9 and 10. After feature selection reduced the dimensionality from 13 to six features, the
recognition accuracy improved: Tables 11 and 12 indicate 98.84% and 98.14% for the normal shift (1.5σ-2.8σ)
and the small shift (less than 1.5σ), respectively. With feature selection, the correct recognition of the normal
pattern improved from 98% to 100% for the normal shift dataset and from 96.35% to 99.9% for the small shift
dataset.


Table 9. Normal shift (1.5-2.8) sigma with 13 features

Pattern  NOR     CYC     IT      DT      US      DS
NOR      98      0       1.2     0       0.8     0
CYC      1.47    97.52   0       0       1       0
IT       0       0       97.76   0       2.23    0
DT       0       0       0       96.65   0       3.34
US       0       0       2.72    0       97.27   0
DS       1       0       0       2.19    0       96.80

Table 10. Small shift less than (1.5) sigma with 13 features

Pattern  NOR     CYC     IT      DT      US      DS
NOR      96.35   0       0       1.6     0       2.05
CYC      0.45    97.55   0       0       1.05    0.95
IT       1.01    0       96.15   0       2.84    0
DT       0       0       0       95.29   0       4.71
US       0       0       5.07    0       94.93   0
DS       1       0       0       4.01    0       94.99


Table 11. Normal shift (1.5-2.8) sigma with six features

Pattern  NOR     CYC     IT      DT      US      DS
NOR      100     0       0       0       0       0
CYC      0.47    99.52   0       0       0       0
IT       0       0       98.76   0       1.23    0
DT       0       0       0       98.65   0       1.34
US       0       0       1.72    0       98.27   0
DS       0       0       0       2.19    0       97.80


Table 12. Small shift less than (1.5) sigma with six features

Pattern  NOR     CYC     IT      DT      US      DS
NOR      99.9    0       0       0       0       0.05
CYC      0.45    99.55   0       0       0       0
IT       0       0       97.15   0       2.84    0
DT       0       0       0       97.29   0       2.70
US       0       0       3.06    0       96.93   0
DS       0       0       0       2.00    0       97.99


These results show that the network works better when the input dimension is decreased from
13 to 6 features, which means that the new approach (RCF-ANN) improved the recognition accuracy.
The classifier can detect small shifts with good recognition accuracy. All these results are for fully developed
patterns and are averaged over 10 runs.
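
The overall recognition accuracy quoted for each confusion matrix can be checked as the mean of its diagonal, assuming equal weight for the six classes (each class has the same number of patterns). The sketch below uses the entries of Table 12; the names `TABLE_12` and `mean_recognition_accuracy` are hypothetical.

```python
# Confusion matrix of Table 12 (rows: true class, columns: recognized class, in %).
TABLE_12 = [
    [99.9, 0, 0, 0, 0, 0.05],
    [0.45, 99.55, 0, 0, 0, 0],
    [0, 0, 97.15, 0, 2.84, 0],
    [0, 0, 0, 97.29, 0, 2.70],
    [0, 0, 3.06, 0, 96.93, 0],
    [0, 0, 0, 2.00, 0, 97.99],
]

def mean_recognition_accuracy(cm):
    """Average of the diagonal entries, i.e. the mean per-class recognition rate."""
    return sum(cm[i][i] for i in range(len(cm))) / len(cm)

accuracy = mean_recognition_accuracy(TABLE_12)
```

The diagonal mean of Table 12 is 98.135, in line with the 98.14% reported for the small shift dataset with six features.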


3.4. Average run lengths 

The average run length (ARL) is an important evaluation measure in SPC. ARL0 applies to a stable
process: it indicates how long the process remains stable before a false alarm occurs, so a large ARL0 is
better than a small one. ARL1 applies to an unstable process: it indicates how many observations are required
before the unstable pattern is correctly recognized. In this paper, the ARL0 calculated for a stable process is
315 for the normal shift dataset and 260 for the small shift dataset. ARL1 was also calculated for the two
datasets: it equals 15 for the normal shift dataset and 15.5 for the small shift dataset. This work can be compared
with previous work that used an ANN-MLP as the classifier, as shown in Table 13.

Table 13 shows the recognition accuracy obtained when using three feature selection algorithms. The
approach distinguishes the normal pattern from the abnormal patterns with 99.9% accuracy on the small mean
shift dataset (less than 1.5 sigma), whereas previous studies used the mean shift dataset (1.5-2.8 sigma). One
possible reason for the higher recognition accuracy compared with previous studies, even though they also used
an ANN-MLP as the classifier, is that the six features selected here, and used together as input for the first time
in this study, represent the raw data well. Using the mixed (statistical and shape) features gives a good
representation of the data as input to the network.


Table 13. Results comparison with previous studies

Ref. | Model | Training algorithm | Optimization | Input | | AC %
Lu et al. [33] | MLP | Back-propagation | - | Statistical and shape features | 8 | 81.3
Naeini and Bayati [6] | MLP | descending gradient | new feature belief variable | Statistical features | 6 | 97.36
Hassan [31] | MLP | BFGS | Recognition only when necessary | Statistical features | 6 | 87.8
This work | MLP | Back-propagation | - | Statistical and shape features | 6 | 98.88

4. CONCLUSION 

The objective of this paper is to propose an improved approach to feature selection that uses three
filter algorithms, namely Relief, correlation, and Fisher. This provides more information than using a single
feature selection algorithm to select the best features for the input representation. An ANN trained with
back-propagation (ANN-BP) was employed as the classifier to classify patterns from two datasets of the kind
used in the previous research with which this work was compared. The first dataset has a mean shift between
1.5σ and 2.8σ, and the second has a small mean shift (less than 1.5σ). The results show that RCF-ANN gave
better performance and good generalization. The network has good recognition accuracy for both datasets, and
the correct recognition accuracy improved from 97.33% and 95.87% to 98.88% and 98.18% for the normal and
small shift datasets, respectively. The mixed (statistical and shape) features contributed significantly and gave
good accuracy. Some misclassification occurred with the small shift data because the small shift patterns
become partly similar to the normal pattern. The average run length ARL0 for the stable process is 315 for the
normal shift dataset and 260 for the small shift dataset, while the ARL1 improved to 15.5 when detecting a small
shift. These experimental results are limited to fully developed patterns. In future work, we plan to test
developing patterns with a moving window and to apply different classifiers.